Chemist versus Machine: Traditional Knowledge versus Machine Learning Techniques

Authors

Abstract

In the past, traditional chemical heuristics have been very important for the discovery of new materials. Machine learning approaches started to replace those in recent years, and they offer opportunities for materials science. Both are strongly interconnected.

Classical heuristics typically rely on less data than machine learning approaches. There are two different types of machine learning approaches in materials science: one relies on features inspired by classical heuristics, the other purely on relationships within the analyzed data.

The growing amount of data offers an opportunity to test classical heuristics. In combination with machine learning techniques, new, more data-driven heuristics should also be developed.

Chemical heuristics have been fundamental to the advancement of chemistry and materials science. These heuristics are typically established by scientists using knowledge and creativity to extract patterns from limited datasets. Machine learning offers opportunities to perfect this approach using computers and larger datasets. Here, we discuss the relationships between traditional heuristics and machine learning approaches. We show how traditional rules can be challenged by large-scale statistical assessment and how traditional concepts commonly used as features are feeding the machine learning techniques. We stress the waste involved in relearning chemical rules and the challenges in terms of data size requirements for purely data-driven approaches. Our view is that heuristic and machine learning approaches are at their best when they work together.

Data science (see Glossary), machine learning, and artificial intelligence are nowadays present in all fields of science and technology, including materials science and chemistry. The impact of these techniques is expected to be large, leading to a new path towards scientific discovery (sometimes called the fourth paradigm of science) [1. Agrawal and Choudhary, APL Mater. 2016; 4: 053208; 2. Hey et al., The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009]. We will not review here the progress and future directions in the use of machine learning in materials science; we refer interested readers to reviews on the topic (e.g., [3. Schmidt et al., NPJ Comput. Mater. 2019; 5: 1-36; 4. Butler et al., Nature 2018; 559: 547-555; 5. Deringer et al., Adv. Mater. 2019; 31: 1902765; 6. Schleder et al., J. Phys. Mater. 2019; 2: 032001]). Instead, we offer a personal perspective on how traditional chemical heuristics and machine learning approaches, heavily relying on sophisticated algorithms and large data sets, compete, complement, challenge, and/or benefit from one another.

Since the early days of chemistry, scientists have looked for patterns, often in limited sets of data. This led to many of today's widely used concepts, such as the periodic system of elements, electronegativities, and atomic radii. Heuristic models were built by combining the scientist's knowledge and creativity with simplified physical pictures. For example, Pettifor introduced a completely new chemical scale that enabled the separation of the structure types of AB compounds in a 2D map. This scale was based on the size, valence, and electronegativity of the constituting atoms [7. Pettifor, Solid State Commun. 1984; 51: 31-34]. We call these approaches traditional or classical heuristics throughout this article (Figure 1, Key Figure).

Recently, machine learning has started to replace traditional heuristics, not only due to improvements in machine learning methods, now available through open source software, but also due to the ever-increasing amount of data, mainly from high-throughput ab initio computations [8. Jain et al., APL Mater. 2013; 1: 011002; 9. Draxl and Scheffler, J. Phys. Mater. 2019; 2: 036001; 10. Curtarolo et al., Comput. Mater. Sci. 2012; 58: 227-235; 11. Álvarez-Moreno et al., J. Chem. Inf. Model. 2015; 55: 95-103]. Machine learning approaches in materials science can be divided into two categories. The first uses classical chemical concepts (e.g., atomic radii and electronegativity) as inputs to provide a relationship with material properties. This brings new descriptors. For instance, machine learning can find a mathematical relationship between well-known atomic properties (e.g., ionization potential) that discriminates between wurtzite and rock salt forming binaries [12. Ghiringhelli et al., Phys. Rev. Lett. 2015; 114: 105503]. The second category uses relationships derived directly from the data, bypassing traditional concepts. For example, the probability of certain ions substituting for each other has been learned in a purely data-driven manner [13. Hautier et al., Inorg. Chem. 2011; 50: 656-663]. No physical/chemical feature such as the ionic radius is implied in this study, only the idea that some substitutions are more likely than others. The three approaches, which include the traditional extraction of heuristics and the two types of machine learning, are summarized in Figure 1. Classical heuristics require less data but are inherently biased by the scientist's preconception, which could turn out to be incorrect.
Conversely, machine learning approaches require data set sizes that are only sometimes available.

Many of the chemical heuristics still taught in general chemistry courses date back at least a century. The concept of electronegativity was proposed by Avogadro and Berzelius in ~1809 [14. Jensen, J. Chem. Educ. 1996; 73: 11], oxidation states by Wöhler (~1835) [15. Karen, Angew. Chem. Int. Ed. 2015; 54: 4716-4726], and atomic radii by Loschmidt (~1866) [16. Rahm et al., Chem. Eur. J. 2016; 22: 14625-14632]. The periodic table dates back 150 years to Mendelejew [17. Mendelejew, Z. Chem. 1869; 12: 405-406]. Goldschmidt and Pauling proposed rules for the stability of crystal structures about 90 years ago [18. Pauling, J. Am. Chem. Soc. 1929; 51: 1010; 19. Goldschmidt, Naturwissenschaften 1926; 14: 477-485]. Without any doubt, these heuristics have been instrumental to the chemical sciences over the last 100 years. However, we must be careful not to use them blindly. Their historical importance should not exclude them from assessment.

Rahm and colleagues have recently shown that some of our most used chemical concepts change drastically at high pressures [20. Rahm et al., J. Am. Chem. Soc. 2019; 141: 10253-10271]. Heuristics face the common issue of extrapolation. Li changes its valence and becomes a p-group element at 300 gigapascal (GPa). K and the heavier alkali metals become transition metals. Furthermore, Na becomes the most electropositive s1 element. Thus, reactivities change with rising pressure. Similar changes can be expected for typical oxidation states. Traditional heuristics might therefore lack transferability, needing to be adapted to allow the exploration of extreme conditions.

Even under more typical conditions, well-established heuristics are powerful but should be evaluated against modern data sets. Hautier and colleagues [21. George et al., Angew. Chem. Int. Ed. 2020; 59: 7569-7575] assessed the Pauling rules, which connect coordination environments to crystal stability. The first rule links the preferred coordination environment of a cation to the cation–anion radius ratio (Figure 2A). A statistical analysis of oxides shows that Pauling's first rule is fulfilled for 66% of the tested local environments, with deviations in the alkali and alkaline-earth chemistries, for instance (Figure 2A).
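
The radius-ratio idea behind Pauling's first rule, mentioned above, can be sketched in a few lines. This is a minimal illustration, not the procedure used in the cited assessment: the ratio windows are the usual textbook values, the function name `predicted_coordination` is hypothetical, and the ionic radii in the usage note are approximate Shannon-type values chosen only for illustration.

```python
# Textbook radius-ratio windows (assumption: standard hard-sphere limits).
# Each entry is (upper bound of r_cation / r_anion, predicted coordination number).
RADIUS_RATIO_WINDOWS = [
    (0.155, 2),  # linear
    (0.225, 3),  # triangular
    (0.414, 4),  # tetrahedral
    (0.732, 6),  # octahedral
    (1.000, 8),  # cubic
]


def predicted_coordination(r_cation, r_anion):
    """Return the coordination number suggested by the radius-ratio rule."""
    ratio = r_cation / r_anion
    for upper_bound, coordination_number in RADIUS_RATIO_WINDOWS:
        if ratio < upper_bound:
            return coordination_number
    return 12  # ratio >= 1: close-packed (cuboctahedral) limit


if __name__ == "__main__":
    # Illustrative radii in angstrom (approximate values, assumed here).
    print(predicted_coordination(0.72, 1.40))  # Mg2+ in an oxide
    print(predicted_coordination(0.26, 1.40))  # Si4+ in an oxide
```

With these illustrative radii the rule suggests octahedral coordination for Mg2+ but threefold coordination for Si4+, which is tetrahedral in most silicates; deviations of this kind are what a large-scale statistical test of the rule quantifies.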
Strikingly, despite the rules being a cornerstone of chemistry, only 13% of the about 5000 tested oxides fulfil the last four rules. This assessment could only be performed thanks to information technology. First, a diverse set of oxide structures had to be determined by crystallographers and made readily available in databases such as the Inorganic Crystal Structure Database (ICSD), the Crystallography Open Database, and the Cambridge Structural Database [22. Groom et al., Acta Cryst. B 2016; 72: 171-179; 23. Zagorac et al., J. Appl. Crystallogr. 2019; 52: 918-925; 24. Gražulis et al., J. Appl. Crystallogr. 2009; 42: 726-729]. Without the efforts of the crystallographic community, such searches would not be possible. In addition, tools for the automatic determination of coordination environments had to be developed [25. Waroquiers et al., Chem. Mater. 2017; 29: 8346; 26. Waroquiers et al., Acta Cryst. B 2020; 76: 683-695], which made a quantitative assessment possible. In a world of unprecedented access to data and automation, it is staggering to realize that Pauling derived his rules from several hundred structures, with no computer to process them [27. Behrens, J. Res. Natl. Inst. Stand. Technol. 1996; 101: 365].

In a similar spirit of challenging traditional heuristics, Filip and Giustino modified and improved the Goldschmidt tolerance factor, which relates the radii of the building ions of a perovskite to the stability of perovskites (Figure 2B) [28. Filip and Giustino, Proc. Natl. Acad. Sci. U. S. A. 2018; 115: 5397-5402]. The tolerance factor treats the ions as tightly packed spheres. To improve the basic tolerance factor, they included further constraints for the prediction that were initially missing in the definition by Goldschmidt […] structures. They can predict stable perovskites with high fidelity (80%). Almost needless to say, the dataset was much larger than in 1926. Based on this, they predicted 90 000 hitherto unknown perovskites.

Since the initial development of these heuristics, both the amount of data and computing processing power have grown substantially. Now is the time for the community to test traditional heuristics rigorously. This includes assessing their validity in extreme conditions (e.g., pressure and temperature) and developing new or updated heuristics following the previous ones.

Advancing along the arrow in Figure 1, another popular approach uses traditional concepts as descriptors in machine learning studies (Figure 3). Materials are represented by features such as electronegativities and radii, and the link to the target property is learned by a model trained on examples. One could, for instance, start from classical features (e.g., radii) and search for materials that deviate from empirical rules.
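
For readers unfamiliar with the Goldschmidt tolerance factor discussed above, the following minimal sketch computes the classical expression t = (r_A + r_X) / (sqrt(2) (r_B + r_X)) together with the octahedral factor r_B / r_X. It does not implement the additional geometric constraints of Filip and Giustino; the function names are hypothetical, and the radii in the usage note are approximate Shannon-type values assumed for illustration.

```python
import math


def tolerance_factor(r_a, r_b, r_x):
    """Classical Goldschmidt tolerance factor for an ABX3 perovskite.

    r_a, r_b, r_x are the ionic radii of the A-site cation, the B-site
    cation, and the anion X, all in the same unit.
    """
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))


def octahedral_factor(r_b, r_x):
    """Ratio of B-site cation radius to anion radius."""
    return r_b / r_x


if __name__ == "__main__":
    # SrTiO3 with illustrative radii in angstrom (assumed values).
    r_sr, r_ti, r_o = 1.44, 0.605, 1.40
    print(round(tolerance_factor(r_sr, r_ti, r_o), 3))
    print(round(octahedral_factor(r_ti, r_o), 3))
```

A tolerance factor close to 1 corresponds to the ideal packing of hard spheres in the cubic perovskite structure, which is why SrTiO3 is a common reference case.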
Various properties, such as band gap, bulk and shear modulus, and vibrational properties, have been successfully learned in this way from composition and elemental properties, including, if necessary, structural features. For example, researchers have shown that the vibrational entropy can be inferred from features such as atomic numbers, position in the periodic table, and electronegativity scales (Figure 4A) [29. Legrain et al., Chem. Mater. 2017; 29: 6220-6227; 30. Tawfik et al., Adv. Theor. Simul. 2020; 3: 1900208].

Figure 4. Importance of the Descriptor in Machine Learning. (A) The figure shows the mean absolute error […]. Five descriptors were tested: composition, elemental properties of the atoms, pair correlation functions (PCF), solid angles as defined by O'Keeffe, and bispectrum components of the density around each atom. Reprinted, with permission, from [29]. Copyright 2017 American Chemical Society. (B) Energy differences between rock salt and zinc blende binary compounds as a function of the optimal descriptor identified in [12]. The descriptor clearly separates the compounds preferring each structure type in the plot. Reprinted from [12], licensed under a CC BY 3.0 license.

While descriptions based solely on composition were the norm in the earlier days, there are now approaches that elegantly incorporate structural information. Graph-based representations have shown promising results [31. Xie and Grossman, Phys. Rev. Lett. 2018; 120: 145301; 32. Chen et al., Chem. Mater. 2019; 31: 3564-3572]. They have allowed the prediction of properties (e.g., formation energy and band gaps) for a variety of compositions and structures through a representation of atoms as nodes and bonds as edges of a graph. Additional atom and bond information (partially traditional concepts such as electronegativities) is included in node and edge vectors. On top of the graph, a neural network allows the target property to be learned. In a similar way, Chen and colleagues developed MatErials Graph Networks (MEGNet) to represent molecules and crystals [32].

Despite the abundance of easily accessible codes (e.g., matminer [33. Ward et al., Comput. Mater. Sci. 2018; 152: 60-69]), it is not always clear which features are adequate for a given property. Chemical knowledge can help in choosing features and motivate a focus […] (e.g., stability), […] possible only […]. Alternatively, one can start from many features and select the most relevant ones. This task has been implemented in [12; 34. Ghiringhelli et al., New J. Phys. 2017; 19: 023017; 35. Ouyang et al., Phys. Rev. Mater. 2018; 2: 083802].

Work by Ghiringhelli and colleagues illustrates the importance of selecting the right descriptor when performing a classification of rock salt and zinc blende formers [12]. Using the least absolute shrinkage and selection operator (LASSO), which helps to select features from a large pool, the authors discovered a simple descriptor (Figure 4B). They found that ionization potentials, electron affinities, and radii describing where the radial probability density of the s and p orbitals is maximum are the most effective features (Figure 4B). The selected descriptor for this property is relatively simple. Feature selection can also be efficiently combined with less-interpretable models such as feed-forward neural networks or graphs [36. De Breuck et al., arXiv. Published online April 30, 2020. arXiv:2004.14766 [cond-mat]], even for small datasets (around hundreds of data points).

The third approach learns directly from raw data without the intermediate step of chemical concepts, using an atomistic-like description of the structure (position of atoms on sites). This strategy is appealing because it does not assume a priori what is predictive, but it usually requires more data (Figure 1). An interesting example comes from the field of crystal structure prediction. For the groups of ions that solid-state chemists have foreseen to substitute for each other, a data-mined model was developed to quantify which cations are likely to substitute [13]. Essentially, the model builds a probability function providing the likelihood that two ions substitute for each other. No assumption is made about what (ionic radius, electronegativity, position in the periodic table) drives the substitutions. The substitution probabilities are given as a matrix in Figure 5. The algorithm has been combined with density functional theory (DFT) to discover new Li-ion battery, luminescent, and ternary nitride materials [37. Wang et al., Joule 2018; 2: 914-926; 38. Chen et al., Chem. Mater. 2012; 24: 2009-2016; 39. Sun et al., Nat. Mater. 2019; 18: 732-739], some of which were confirmed experimentally. Some substitutions are surprising and not rationalizable on standard chemical grounds. For example, the model suggested substituting Ba2+ with Sr2+, Re7+ with Al3+, and N3– with O2– to form a new phosphor host material, Sr2LiAlO4, which was then synthesized [37].

Another purely data-driven approach is replacing traditional force-fields with machine-learned interatomic potentials [5; 40. Bartók et al., Phys. Rev. Lett. 2010; 104: 136403; 41. Behler and Parrinello, Phys. Rev. Lett. 2007; 98: 146401]. Traditional force-fields such as Lennard–Jones [42. Jones, Proc. R. Soc. Lond. A 1924; 106: 463-477] or the embedded-atom method [43. Daw et al., Mater. Sci. Rep. 1993; 9: 251-310] have strong physical assumptions embedded. Recently, machine learning potentials bypass these assumptions by directly connecting structure to energetics and forces without assumptions about the underlying relationships. In this sense, they make minimal assumptions. However, a fit to a large set of reference data (e.g., from DFT) is needed. The approaches differ in how they encode the structure and in the regression used. Popular ones are the Gaussian approximation potentials by Bartók, Csányi, and colleagues, in which the 'smooth overlap of atomic positions' (SOAP) descriptor is used, and the neural network potentials by Behler, which use atom-centered symmetry functions [40; 44. Bartók et al., Phys. Rev. B 2013; 87: 184115; 45. Behler, J. Chem. Phys. 2011; 134: 074106]. SOAP encodes local geometries through an expansion of smeared atomic densities. The possibilities are immense, allowing simulations of phenomena requiring large structural models (e.g., grain boundaries [46. Hu et al., Mater. Today 2020; 38: 49-57; 47. Yokoi et al., Phys. Rev. Mater. 2020; 4: 014605] and amorphous phases [48. Sosso et al., Phys. Rev. B 2012; 85: 174103; 49. Deringer and Csányi, Phys. Rev. B 2017; 95: 094203]) and structure prediction [50. Deringer et al., Faraday Discuss. 2018; 211: 45-59]. It has already been used in the investigation of battery and thermoelectric materials [51. Deringer, J. Phys. Energy 2020; 2: 041003; 52. George et al., J. Chem. Phys. 2020; 153: 044104].

Purely data-driven approaches are attractive because they do not rely on chemical concepts that might be inadequate, but one should keep in mind that they might simply relearn known chemistry. An illustrative example is the search for water-splitting materials among ABX3 perovskites [53. Jain et al., J. Mater. Sci. 2013; 48: 6519]. A search with a genetic algorithm was performed, focusing on stability, band gap, and band edges (Figure 6, blue line, genetic algorithm). On the same data set, simple chemical rules were applied: the sum of oxidation states must be 0, and only an even number of electrons in the primitive unit cell is considered, to avoid […]. This ranking is similarly successful (Figure 6, red line, chemical). Obviously, here, the genetic algorithm is 'reinventing the wheel' and learning precisely the known rule of charge balance. Interestingly, combining the chemical rules with the genetic algorithm optimization leads to the best performances. Joining both approaches together prevents the reinvention of the wheel while capturing additional information (Figure 6, green line).
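
The two chemical rules named above (oxidation states summing to zero and an even electron count per primitive cell) are simple enough to write down as a screening filter. The sketch below is an illustration under stated assumptions, not the workflow of the cited study: the function names are hypothetical, and the oxidation-state lists are a small hand-picked set of common values.

```python
from itertools import product

# Atomic numbers for the elements used in the example.
ATOMIC_NUMBERS = {"Sr": 38, "Ti": 22, "La": 57, "O": 8}

# Assumption: a short, illustrative list of common oxidation states.
OXIDATION_STATES = {"Sr": [2], "Ti": [2, 3, 4], "La": [3], "O": [-2]}


def is_charge_balanced(composition, oxidation_states=OXIDATION_STATES):
    """True if some assignment of the listed oxidation states sums to zero."""
    elements = list(composition)
    candidates = (oxidation_states[element] for element in elements)
    for assignment in product(*candidates):
        total = sum(q * composition[el] for q, el in zip(assignment, elements))
        if total == 0:
            return True
    return False


def has_even_electron_count(composition, atomic_numbers=ATOMIC_NUMBERS):
    """True if the neutral formula unit contains an even number of electrons."""
    electrons = sum(atomic_numbers[el] * n for el, n in composition.items())
    return electrons % 2 == 0


def passes_chemical_rules(composition):
    """Apply both rules; candidates failing either one are discarded."""
    return is_charge_balanced(composition) and has_even_electron_count(composition)


if __name__ == "__main__":
    for name, formula in [
        ("SrTiO3", {"Sr": 1, "Ti": 1, "O": 3}),
        ("LaTiO3", {"La": 1, "Ti": 1, "O": 3}),
        ("SrLaO3", {"Sr": 1, "La": 1, "O": 3}),
    ]:
        print(name, passes_chemical_rules(formula))
```

In this toy set only SrTiO3 passes both rules: LaTiO3 is charge balanced but has an odd electron count, and SrLaO3 cannot be charge balanced with the listed oxidation states. Applying such a filter before an optimization run removes candidates that the optimizer would otherwise have to learn to avoid.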
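One aspect that is not often mentioned is that machine learning also enables the researcher to decide how much chemical knowledge to use (from atomic numbers to bonding to the inclusion of heuristics), depending on the availability of data. Traditional heuristics are rooted in the desire to build models explaining the data. Instead of seeing the two approaches as competitors, we think they should instead feed each other […] (see Outstanding Questions). Traditional heuristics can serve as benchmarks in machine learning studies. Researchers are encouraged to validate their models with cross-validation and to compare the performance of artificial intelligence with that of traditional heuristics. In our opinion, machine learning provides true added value only when it convincingly beats traditional heuristics. If the outcome of machine learning is charge balance, one can question the advance achieved with the tool. The historical significance of heuristics should not preclude their assessment. In fact, we should remember that they were derived from limited datasets and might, therefore, be highly biased. The boom in information technology presents an opportunity to go back to the drawing board and design new heuristics with data-driven approaches.

Outstanding Questions

Will intuitive chemical understanding profit from new, data-driven heuristics? Traditional heuristics are taught at university and are at the core of chemical understanding. […] contribution […] heuristics.

Are current heuristics […] materials discovery? Most […] serendipity […]. Therefore, […] and, […] assessments during materials discovery.

Should machine learning frameworks offer easy testing against traditional heuristics? True advances should beat traditional heuristics. Currently, […] straightforward ways […].

J.G. acknowledges funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 837910. We thank Pierre-Paul De Breuck for helpful comments on the manuscript. Figures […] 4 were created with VESTA [54. Momma and Izumi, J. Appl. Crystallogr. 2011; 44: 1272-1276].

Glossary

Artificial intelligence: 'intelligent' behavior of machines. Machine learning is a subfield of artificial intelligence.

Chemical heuristics: strategies that guide solving problems and making decisions.

Graph convolutional neural networks: […] (and properties) […] connections […]. The latter consists of pooling layers, which produce an overall feature vector by convoluting the vectors of neighboring atoms.

Data science: focuses on […] large amounts of data; closely related to data mining, and the terms are often used interchangeably.

[…]: indicates […] structure.

LASSO: regression with a penalty on the L1-norm, which ensures […] variables.

[…]: analyze […] therefore […] structure–stability relationship; connect […] stability/existence […] connected.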
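
The LASSO idea referred to in the text (regression with an L1-norm penalty that drives the coefficients of irrelevant features to zero) can be illustrated with a small coordinate-descent sketch in pure Python. This is an assumption-laden toy, not the implementation used in the cited descriptor-selection work: the function names are hypothetical, the objective is taken as (1/2n)·||y − Xw||² + alpha·||w||₁, and the data are synthetic.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic updates."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n_samples):
                partial_prediction = sum(
                    X[i][k] * weights[k] for k in range(n_features) if k != j
                )
                rho += X[i][j] * (y[i] - partial_prediction)
                norm += X[i][j] ** 2
            weights[j] = soft_threshold(rho / n_samples, alpha) / (norm / n_samples)
    return weights


if __name__ == "__main__":
    # Synthetic data: three mutually orthogonal features, target dominated
    # by feature 0 with a tiny contribution from feature 1.
    X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
    y = [2.0 * row[0] + 0.05 * row[1] for row in X]
    print(lasso_coordinate_descent(X, y, alpha=0.1))
```

On this synthetic set the penalty keeps only the first feature (with a slightly shrunk coefficient) and sets the other two exactly to zero, which is the sparsity property that makes LASSO useful for picking a few descriptors from a large pool.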

Similar resources

Machine Learning versus Knowledge Based Classification of Legal Texts

This paper presents results of an experiment in which we used machine learning (ML) techniques to classify sentences in Dutch legislation. These results are compared to the results of a pattern-based classifier. Overall, the ML classifier performs as accurately (>90%) as the pattern-based one, but seems to generalize worse to new laws. Given these results, the pattern-based approach is to be pref...

Statistical versus knowledge-based machine translation

over, and what kinds of symbol systems should we create for them? Every time a new phenomenon is identified as a bottleneck or as problematic, the very acts of describing the phenomenon, defining it, and creating a set of symbols to represent its abstractions are symbolic (in both senses of the word!). The benefits: decreased learning time and more powerful rules, hence improved ...

Man versus Machine versus Ribozyme

Primer "The steam drill was on the right hand side, John Henry was on the left, Says before I let this steam drill beat me down, I'll hammer myself to death" (The Ballad of John Henry, American, traditional). Organisms and molecules achieve evolutionary success via many possible paths, what evolutionary biologists sometimes call the "tempo and mode" of evolution [1]. In contrast, engineered m...

Appendix: Machine Learning Bias Versus Statistical Bias

is if and 0 if. This high variance may help to explain why there is selection pressure for weak (machine learning) bias when the (machine learning) bias correctness is low. The reason that statisticians are interested in (statistical) bias and variance is that squared error is equal to the sum of squared (statistical) bias and variance. Therefore minimal (statistical) bias and minimal variance ...


Journal

Journal title: Trends in Chemistry

Year: 2021

ISSN: 2589-5974, 2589-7209

DOI: https://doi.org/10.1016/j.trechm.2020.10.007